ABSTRACT


Two centralized Air Force systems, one dealing with finance and
one with personnel assignment, were used to study applications of
large-scale computers. A generalized time-sharing computer system was
modeled, and a simulation was run to measure query response time under
various hypothetical conditions.


SECTION I


INTRODUCTION 


During the past year, the Information Systems Operation of the General
Electric Company has conducted a series of studies pertaining to the
application of large-scale computer systems to information storage and retrieval.
Specifically, two Air Force functions were examined to determine the
feasibility of centralizing the tasks at a computer center with remote access.

Several aspects are important in the specification of a large computer
system. When the processing load is great due to the bulk of data or the
frequency of querying, it is necessary that prime capability be efficiently
utilized among tasks. As possible aids in the design of this type of system,
these studies pursue gains in efficiency made possible by both (1) data
organization techniques, and (2) on-line time sharing of data and processing
equipment.

The applications examined, (1) an overall pay system and (2) a system
to aid in the assignment of personnel to jobs, proved interesting in their
demands upon large-scale data-handling and manipulation capabilities. Since
hypothetical requirements were frequently imposed on the systems to
facilitate the studies, specific conclusions should not be drawn about either system.

The  hypothetical  Air  Force  pay  system  requires  on-line  querying  by  a 
large  number  of  users  of  an  extensive  and  highly  dynamic  data  base.  The 
matching  of  men  and  jobs  by  computer  requires  an  assignment  algorithm  which 
will  make  consistent,  acceptable  assignments  using  the  broadest  spectrum  of 
candidates. The personnel files used in assignment are also large and
frequently updated. The timely communication and acceptance of updates add a
significant load to both study systems.

These Air Force applications are examples of the classes of functions
which can be handled using on-line, remote-inquiry computer systems.
Examination of these tasks provided a basis for constructing a generalized
time-shared computer system model, which is the third in a series of studies.
Simulation of the general model was performed to measure the adequacy of
configurations for a set of hypothetical requirements.

Feasibility of both the pay and man-job match systems was shown and
each was examined as a time-sharing type of application. The generalized
time-sharing model showed centralization of all computational power to be
more economical than distributing logical capability to remote stations.

Three supporting analytic studies were performed and are contained in
Appendixes to avoid unnecessary detail in the body of the report. The first
analytical study deals with a means for partitioning a large file to permit,
in some cases, greatly reduced searching times. The second deals with a
mathematical model for a time-shared computer system which allows for
analytical calculation of processing times at each terminal as a function of
system loading. The third investigates three computational algorithms for
performing man-job match calculations. Estimates of processing times are
given, and the methods compared.

The following sections address each study in detail. Specific results,
conclusions, and recommendations are presented with each study. The
three supporting analytic studies are referenced when appropriate.


SECTION II


STUDY  OF  A  CENTRALIZED  AIR  FORCE  PAY  SYSTEM 


1. Introduction


This section is devoted to a study to determine the feasibility of
designing an Air Force Centralized Pay System involving multiple users on a
real-time basis. The principal requirement is that of responding to rapid
querying of an extremely large file. The system load of more than one query
per second against a file of approximately 2 billion characters rules
out more conventional tape or drum configurations. This system is
characteristic of many systems such as the following: (1) personnel systems for
keeping financial records, (2) logistics systems for keeping track of parts and/or
reliability histories, and (3) large document storage and retrieval systems.
Only very recently has the market seen the emergence of devices for randomly
storing billions of characters at a cost per character that does not greatly
exceed tape storage costs. The following paragraphs describe the
characteristics of the Air Force pay system which was used for study, present
various configurations which use the storage devices examined, and offer
recommendations for the implementation of such a system.

2. Characteristics of Study System


The pay system has two functions: computing and authorizing pay for
all military personnel, and compiling and presenting statistics on fund
allocations. There are two principal files: a master file with one record for
each individual, and the Summary File composed of fund totals. Attention
focuses on the processing required by the incoming queries since it is this
load which taxes the equipment most. Report generation and other routine
and regular processing can be considered to be performed "off-line" as far as
the system is concerned, since there is no rigorous time constraint and night
shifts can be used.

There are many users querying the computer system at the rate of
58,000 questions per day. (See Figure 1.) Each one of these requires
retrieval from the 2-billion-character master file of one personnel record,
averaging 2,000 characters.

Figure 1. Military Pay System

Eighty-one percent of these, or 47,000, make some change to the
individual's record, so that it must be written back into storage in its
corrected form. The remaining 11,000 daily queries do not alter the data and no
second access is needed. It is assumed that every change which is made to a
personnel record will also cause a change in the Summary File in order to update
the statistics compilation. This update will also require two accesses of
the storage device so that the corrected entry can be recorded. Based on
this load, the bulk storage device will see 105,000 search requests in the
course of one day shift (58,000 master file searches plus 47,000 Summary
File searches). In addition, it will be accessed 94,000 times
when the specific location has been previously determined. These "second"
accesses occur when a corrected record is ready to be written back into place
in its file. In the case of using a drum or disc as the storage medium,
one-half revolution time is the average access time for "second" accesses.
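
The access counts quoted above follow directly from the query load. The short Python sketch below reproduces the arithmetic; the variable names are illustrative and are not taken from the study.

```python
# Daily bulk-storage access counts implied by the stated query load.
QUERIES_PER_DAY = 58_000
UPDATING_QUERIES = 47_000   # roughly 81 percent of all queries change a record

# Queries that only read a record need no second access.
read_only_queries = QUERIES_PER_DAY - UPDATING_QUERIES

# Every query searches the master file once; every updating query also
# searches the Summary File once.
search_requests = QUERIES_PER_DAY + UPDATING_QUERIES

# Every updating query writes back one master record and one Summary File
# entry at a location that is already known.
second_accesses = 2 * UPDATING_QUERIES

print(read_only_queries, search_requests, second_accesses)
```

The three printed figures correspond to the 11,000 read-only queries, the 105,000 search requests, and the 94,000 "second" accesses used throughout this section.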

It is also assumed that only limited computational capability is
required to perform these operations. The standard programs needed to compute
pay under a variety of options are straightforward and are assumed to be held
in core at all times and do not put additional searching load on the bulk
storage devices. Assuming that the average number of instructions executed per
query is less than 3,000, the average processing time per query will be less
than 10 milliseconds for the range of central processors examined. Those
computers offering peripheral storage in billions of characters operate within
the time frame given above.


Since this is a feasibility study, the aspects of buffering, queue size,
and overall time delays were not explicitly examined. It is assumed that the
arrivals are evenly spaced over an eight-hour day and that they are stored in
core upon arrival until the processor is ready to answer them.
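
Under the even-arrival assumption, the query rate and an upper bound on processor loading can be estimated as in the following sketch. It is only an illustration of the stated figures; the 10 ms value is the upper bound on processing time per query given above, and the names are hypothetical.

```python
# Arrival rate and processor loading for evenly spaced arrivals.
QUERIES_PER_DAY = 58_000
SHIFT_SECONDS = 8 * 3600        # one eight-hour day
MAX_PROCESSING_S = 0.010        # stated upper bound on processing time per query

arrival_rate = QUERIES_PER_DAY / SHIFT_SECONDS        # queries per second
mean_spacing_ms = 1000.0 / arrival_rate               # time between arrivals

# Upper bound on the fraction of the shift the processor spends on queries.
processor_busy_fraction = arrival_rate * MAX_PROCESSING_S

print(f"{arrival_rate:.2f} queries/s, one every {mean_spacing_ms:.0f} ms")
print(f"processor busy at most {processor_busy_fraction:.1%} of the shift")
```

The result is consistent with the statement that the system is limited by bulk storage access rather than by processing.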

3. Alternative Storage Devices


Characteristics and advantages of various bulk storage devices are
given as follows. (See footnote 1.) The range of times within which the
processing load would fall is presented. Transfer rates are given, but since access time is
the overriding time constraint, read and write time can always be overlapped
so as not to cause additional delay. This is also true of processing time as
demonstrated above. A summary of equipment characteristics is provided in
Figure 2.


3.1 Univac FASTRAND Drum


• The data file in the study system requires two subsystems
for a total of 2.11 billion characters stored on 32 drum
units with two controllers.

• The transfer rate is 183 kc.

• It is possible to preposition the heads so that the average
access time is 40 ms as opposed to 92 ms average access
time without prepositioning.

• The maximum access time required for an eight-hour load
is 3.7 hours, assuming no overlap and no prepositioning
except in the case of "second" accesses. Assuming that the
messages arrive so that prepositioning can be used to the
utmost, the minimum access time for the load is 1.1 hours.
The real system would operate somewhere between these
extremes.


1. Information derived from appropriate manuals as listed in References
[1] through [10].

Figure 2. Summary Chart

3.2 IBM 1302 Disc Model 2

•  Two  disc  subsystems  would  be  required  to  hold  the  data 
base.  Two  controllers  would  handle  10  disc  units  with  a 
total  of  2.34  billion  characters  of  storage. 

•  The  transfer  rate  is  184  kc. 

•  Since  each  disc  unit  has  four  access  mechanisms,  there 
can  be  a  maximum  of  40  simultaneous  search  operations. 

• The average access time is 180 ms, but simultaneous
searching when possible reduces the effective access time.

• Allowing "second" accesses to take 70 ms, and first
accesses to take the maximum 180 ms, 7.1 hours would be
required to handle an eight-hour load. If it were possible
to initiate the 40 simultaneous seeks continually, the
entire request load could be answered in 15 minutes.

3.3 IBM 2321 Data Cell

• Five 2321 systems connected to one 7631 File Control will
provide 2.5 billion characters of storage.

• The transfer rate is 72 kc.

• The storage medium is magnetic strips which can be mounted
for reading and writing and then stored. Depending upon
how much strip action is required, the access time can vary
between 50 and 600 ms.

• The worst case value is derived from sequential searching
in which the correct strip is never mounted. This maximum
value is eight hours; it can never be greater than eight hours,
for if the backlog were that large, some natural batching would
have occurred so that in some cases there would be more than
one requested record per strip. Since there are five systems
available in this configuration, the strip action can be
overlapped to give an effective average access time of 120 ms.
"Second" accesses would be roughly 67 ms since the correct
strip would always be mounted. Maximum use of the
overlapped searches would allow the entire load to be handled in
five hours.

• Because the strip handling causes wear, each strip has a finite
lifetime, meaning that replacement and maintenance
must be provided.

3.4  RCA  Model  3488 


• Three 3488 sixteen-magazine units under control of one 380
Channel will provide 2.04 billion characters of storage on
magnetic cards.

• The transfer rate is 80 kc.

• Preselection of a second card can overlap the feed, spin, and
return of the currently drum-mounted card. This selection
process takes a constant 170 ms so that preselection can
save a significant portion of access time.

• Access time varies between 30 and 465 ms depending on the
amount of card manipulation required.

• Without preselection, accessing for an eight-hour request
load would require eight hours. As discussed for the Data Cell,
accessing cannot be greater than eight hours without finding more than
one desired record per card in some cases; this natural
batching would be possible with a backlog. Making full use
of preselection, the minimum access time for the load is 6.3
hours. The very best case is overlap searching in all three
units simultaneously; this reduces the total access time to
2.5 hours. All "second" accesses would require 30 ms,
which is half the drum revolution time.

• Card handling will cause wear requiring maintenance and
replacement. Sample failure times are given as 30,000
extractions of a card or 100,000 continuous revolutions while
mounted on the drum.

3.5 General Electric Disc (DSU 250)


•  Eight  disc  units  under  direction  of  two  controllers  would  be 
required  for  2  billion  characters  of  storage. 

•  Transfer  rate  is  220  kc. 

•  Each  disc  unit  has  16  independent  actuators  which  can  be 
commanded  to  seek.  Therefore,  the  system  can  have  128 
concurrent  searches. 

• Average access time with no overlap is 90 ms. "Second"
accesses will be less than half of this. Assuming no overlap
searching, an eight-hour load will require 3.8 hours. At the
other extreme, full use of overlap capability would allow the
load to be handled in just a fraction of an hour.

3.6 Honeywell 1800 Disc


• Almost 2.5 billion characters of storage is contained in
three units.

• Average access time is 110 ms. Therefore, without overlap
of any kind the daily request load would take 4.5 hours of
accessing. Detailed information was not available, but
utilizing only the simultaneity of the three units operating
in parallel, one hour would be needed.
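
The no-overlap totals quoted for several of the devices can be checked with a simple serial model: 105,000 search requests at the first-access time plus 94,000 "second" accesses at the second-access time. The Python sketch below applies that model to the devices for which both times are stated. The model itself and the 45 ms figure for the General Electric disc (taken as half of 90 ms, the stated upper limit) are assumptions made for illustration.

```python
# Serial (no-overlap) access time for the daily load, in hours.
SEARCHES = 105_000          # accesses that require a search
SECOND_ACCESSES = 94_000    # write-backs to an already known location

def serial_access_hours(first_ms, second_ms):
    """Total access time if every access is performed one after another."""
    total_ms = SEARCHES * first_ms + SECOND_ACCESSES * second_ms
    return total_ms / 3_600_000.0

# (first-access ms, second-access ms) as stated in the device descriptions;
# the 45 ms value for the GE disc is an assumed upper limit.
devices = {
    "Univac FASTRAND, no prepositioning": (92, 40),
    "IBM 1302 Disc": (180, 70),
    "IBM 2321 Data Cell, overlapped strip action": (120, 67),
    "GE DSU 250": (90, 45),
}

for name, (first_ms, second_ms) in devices.items():
    print(f"{name}: {serial_access_hours(first_ms, second_ms):.1f} hours")
```

Under these assumptions the model gives totals close to the 3.7, 7.1, five, and 3.8 hour figures quoted above, which suggests that the stated times were derived in a similar way.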

4. File Organization

One of the most important features of any query system is the
organization of the files. Since each query in the study system refers to a unique
individual, the file is ordered by a serial number. This serial number is used
as a primary address locator by revealing what disc, card, strip, or area
should be searched. A table lookup in core could be performed on the
leading digits of the number. There could perhaps be a secondary address
locator on each disc, card, etc., which could be examined to find the exact
location desired. A high-speed drum containing a complete address lookup
could also be used as a locator. The Univac FASTRAND Drum has content
search instructions which could be employed here to save reading an address
locator into core and finding the desired address.
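
The two-level locator described above can be sketched as follows. This is a minimal illustration, not the study's design: the eight-digit serial number width, the two leading digits, the four storage units, and the track boundaries are all hypothetical values chosen for the example.

```python
from bisect import bisect_right

# Hypothetical layout parameters (not from the study).
SERIAL_DIGITS = 8       # assumed width of a serial number
LEADING_DIGITS = 2      # digits examined by the in-core table
UNITS = 4               # number of storage units (discs, cards, strips)

# Primary locator held in core: one entry per leading-digit value,
# giving the unit on which records with that prefix are stored.
PRIMARY_TABLE = [
    lead * UNITS // 10 ** LEADING_DIGITS
    for lead in range(10 ** LEADING_DIGITS)
]

def primary_locate(serial):
    """Return the storage unit selected by the leading digits."""
    lead = serial // 10 ** (SERIAL_DIGITS - LEADING_DIGITS)
    return PRIMARY_TABLE[lead]

def secondary_locate(track_first_serials, serial):
    """Return the track whose serial number range contains `serial`.

    `track_first_serials` is the secondary locator stored on the unit:
    the first serial number on each track, in ascending order.
    """
    index = bisect_right(track_first_serials, serial) - 1
    if index < 0:
        raise KeyError(serial)
    return index

# Example: find the unit, then the track within a unit's secondary locator.
unit = primary_locate(12_345_678)
track = secondary_locate([0, 5_000_000, 10_000_000, 20_000_000], 12_345_678)
print(unit, track)
```

Because the file is ordered by serial number, both lookups are simple range searches, and only the small primary table needs to stay in core.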

Storage capacities given in the preceding section are stated in 6-bit
characters (the RCA 3488 has 7-bit characters). It should be noted that the IBM
equipment (1302 Disc and 2321 Data Cell) also has an 8-bit mode. Pairs of
numeric digits in the data file can be stored in one 8-bit character as
opposed to two 6-bit characters. Any numeric data coded in this way will
occupy one-fourth less space. The data will have to be tagged to be
recognized, but since this system is not processing limited, the extra processing
will not be detrimental. This form of packing would only be advantageous if
it could reduce the file size by an amount equal to or greater than a storage unit.

5. Batch Processing

If the constraint of real-time operation were relaxed, some of the
devices described could be used much more effectively. If all the incoming
requests were ordered and batched and stored until the end of the day, the
processor would be free for the majority of the day to perform other tasks. For
example, if the file were stored on the RCA 3488, one complete pass could be
made by each card with an average of 4.7 requested records per card. This
would reduce the entire accessing and processing time to about one hour.
(Best case with continuous processing for this device is 2.5 hours.) There is
additional time to be added to this hour: the time to sort or order the queries
by serial number for the final processing. Queries could be stored on a drum
in such a manner that sorting and merging would not be necessary as it would
be if they were to be held on tape until final processing.
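
The batching idea can be illustrated with a small sketch: requests are ordered by position in the serially ordered file and grouped by card, so that each card is handled once per batch. The records-per-card figure and the sample request positions are hypothetical.

```python
from itertools import groupby

RECORDS_PER_CARD = 80   # hypothetical; not a figure from the study

def card_of(record_position):
    """Card holding the record at this position in the ordered file."""
    return record_position // RECORDS_PER_CARD

def batch_by_card(requested_positions):
    """Order the day's requests and group them so each card passes once."""
    ordered = sorted(requested_positions)
    return [(card, list(group)) for card, group in groupby(ordered, key=card_of)]

# Seven requests arriving in arbitrary order during the day.
requests = [405, 12, 83, 9, 170, 95, 401]
batches = batch_by_card(requests)

for card, positions in batches:
    print(f"card {card}: records {positions}")
print(f"{len(batches)} card passes instead of {len(requests)} separate accesses")
```

The saving grows with the number of requested records per card, which is why the stated average of 4.7 records per card reduces the accessing time so sharply.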

Cost  can  be  reduced  by  batching  because  it  would  only  be  necessary  to 
have  one  RCA  3488  unit  instead  of  three.  The  magazines  are  interchangeable 
so  that  when  all  processing  of  the  first  block  of  16  magazines  is  completed, 
the  next  block  can  be  loaded  in  its  place.  Also  the  cards  will  get  less  wear 
by  being  accessed  only  once  a  day.  Maintenance  is  thus  reduced. 

The  previous  also  applies  to  the  IBM  Data  Cell  which  has  removable 
blocks  of  magnetic  strips. 

It  must  be  kept  in  mind  that  with  batching  there  is  an  automatic  one- 
day  delay  (or  whatever  batching  time  is  chosen)  before  answers  are  received. 

If requests are batched and disc or drum-type storage is being used, the
accessing delay time will be much closer to that stated for best case for these
devices (Figure 2) than could be achieved without batching.

The Summary File represents a compilation of statistics which is not
often queried. The desirability of accessing this file for updating more than
once per query is certainly questionable in terms of the low number of queries
addressed to it. The updating of this file could be done once a day, or at the
time it is queried. All changes would be saved, accumulated, and applied in
one pass of the file. If this batching is not allowed, and it is determined
that there are many Summary File accesses per query, it would perhaps be
advisable to put this file on a high-speed drum. The Summary File is relatively
small in size compared to the master file and it could be effectively updated
in parallel with the query processing. This would of course reduce the access
times given for the various systems by almost one-half.

6. Conclusions and Recommendations

The objective of this study was to show feasibility of building the sort
of query system described. This has been demonstrated in terms of the variety
of equipment configurations which would satisfy the system requirements.
Economic feasibility remains to be studied.

Since  the  transfer  time  for  2,000  character  records  is  much  smaller  than 
the  access  time  in  the  study  system,  it  would  in  general  be  impossible  to 
utilize  more  than  the  minimum  number  of  channels.  No  time  will  be  saved  by 
adding  additional  capability  in  this  area. 

It will be possible to interleave other processing tasks which do not
require bulk storage access with the continuous query handling. This will
increase the productive output of the system and therefore its efficiency.

Referring to Figure 2, the percentage growth listed for each device is
calculated in terms of additional request load; it is then really a reflection of
the number of bulk storage accesses which can be tolerated in an eight-hour
period. This growth capability shown on the figure relaxes some of the
limitations of the assumptions made. For example, suppose it is determined that an
extra access per query must be made in order to locate the record desired.
This will add to access time, but not enough to place any of the devices in a
marginal performance area. Similarly, more accesses to the Summary File per
query could be tolerated.

In order to make the final decisions of system configuration, cost must
be applied and the other uses of the system must be defined. If the
equipment is to serve no other needs, then the least expensive system which can
possibly provide response to the query load expected should be chosen. If,
however, there are other tasks to be performed, capability greater than needed
just for the query handling will be necessary. In the extreme, if the query
handling is to be of only secondary importance in the system, batching of
requests should be considered. Since there is a wide range of equipment
available, it will be possible to match closely the functions of the system.


SECTION III


STUDY  OF  AIR  FORCE  PERSONNEL  ASSIGNMENT  SYSTEM 


1. Introduction


This section is devoted to the study of an Air Force personnel
assignment system, the goals of which are to identify potential problem areas in
a full-scale system and to suggest guidelines for system design which are
valid for most large-scale computer systems. Attention is focused on
two areas: file organization and computational requirements.


The central organizational concept, on which the proposed automated
system is based, is to have comprehensive career records of Air Force
personnel (footnote 2) and codified job descriptions to which the individuals'
qualifications may be matched. Updated man and job files would be continually available
for the assignment exercise which, when approved, would be entered into the
permanent files as ordinary updating. Because of the magnitude of the file
size (1 billion characters) and the assignment load (6000 individuals per
week), file organization is of central importance.


The value of a full-scale system will be measured in terms of the
degree to which it provides proper assignments, the time to accomplish this,
and the cost to perform the task. The present study has not been directed at
the effectiveness of assignments. However, some assignment algorithms
are discussed in paragraph 4 of this section.


2. System Design Requirements


The Air Force man file is composed of 130,000 officer records and
720,000 airman records. The job file contains 850,000 records; each
describes a unique job, e.g., weather officer, flight surgeon, mechanic, etc.
Referring to Figure 3, the job file is searched on a regular basis and all
open job records are extracted. Then the man file is searched to identify
personnel available to assume the open jobs. These available jobs and men
are then matched by qualifications and a matrix is constructed which shows
the value of each man to each job. An optimization technique is then
applied to the matrix to obtain an assignment of men to jobs. Subsequent
updates reflect the assignments.

The man-job matrix size is a function of the cycle time (time since a
particular job category was last searched for openings and assigned new
personnel). An individual's cycle time (time between new assignments) is
assumed to be three years. Therefore, a third of all personnel in any job
category are reassigned each year. Since personnel are not distributed
evenly among the job categories, it will be necessary to recycle different
job categories at different rates and with varying matrix sizes.


2. All personnel levels except E8, E9, Colonel, and General.

Figure 3. Recycling Available Men to Open Jobs

Processing required by an optimization procedure considered in a
prototype system at ESD was to be approximately proportional to the cube of
the matrix size. (See Decision Index Method, paragraph 4.) If the cycle
time is long and the matrix large, the additional processing time may be
prohibitive. However, the optimal matrix size with respect to processing
time may result in a matrix so small that an individual is never considered
for a large percentage of potential jobs.

Matrix size, processing time, and cycle time are three interrelated
parameters which can be adjusted within limits to influence system
performance. These relationships exist independent of the configuration of the
system and therefore do not have to be defined in the early design stages.
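
The interaction between matrix size and processing time can be illustrated with a rough sketch. It assumes square matrices, a fixed weekly load of 6000 assignments, and processing per run exactly proportional to the cube of the matrix size; the resulting figures are relative units only, not estimates of machine time.

```python
import math

WEEKLY_ASSIGNMENTS = 6000   # assignment load stated for the study system

def weekly_runs(matrix_size):
    """Number of assignment runs needed to cover the weekly load."""
    return math.ceil(WEEKLY_ASSIGNMENTS / matrix_size)

def relative_weekly_processing(matrix_size):
    """Weekly processing in arbitrary units, assuming cost ~ size cubed."""
    return weekly_runs(matrix_size) * matrix_size ** 3

for size in (50, 100, 200, 400):
    runs = weekly_runs(size)
    cost = relative_weekly_processing(size)
    print(f"size {size:3d}: {runs:3d} runs, relative processing {cost:.2e}")
```

Under these assumptions, doubling the matrix size halves the number of runs but increases each run eightfold, so the total weekly processing grows by about a factor of four. A smaller matrix is cheaper but considers each individual for fewer jobs, which is the trade-off described above.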

3. System Configuration

Four alternative configurations are presented with their relative
advantages. There exists another Air Force system which contains a master
file of all personnel data with on-line updating. It was suggested by ESD
that the system under study communicate with the master file system. The
degree to which this supplied link is utilized represents the difference
between the configurations described below.

The  responsibility  of  updating  the  job  file  falls  to  the  man/job  system 
in  all  configurations,  but  it  is  assumed  that  this  file  is  altered  infrequently. 
One entry, the social security number of the incumbent (footnote 3), intended to assure that
one  is  not  recycled  to  his  own  job,  could  be  updated  when  the  entire  job 
category  assignment  has  been  approved.  All  changes  can  be  made  to  the 
job  category  tape  file  at  one  time. 

3.1 Configuration A

At one extreme of the scale suggested above lies the case where the
man/job assignments are executed by a computer system which has no other
tasks and which is self-sufficient in the sense that no data or programs are
shared with any other system. It relies on the master file system only for
the transmission of updating information, as seen in Figure 4. Whenever
the master file system receives a file change, it is formatted and sent to the
man/job system for incorporation into its files. The man/job system has no
real-time requirements; it is only necessary that files be currently correct
when they are being searched.


3. This assumes that all personnel have applied for and possess social
security numbers.

Figure 4. Configuration A

Since the files would not be shared, it would be feasible to organize
them in the most advantageous manner with regard to the assignment problem.
Different cycles or job categories are completely independent so that each
one could act as a separate file. If the files were ordered according to job
category number, it would be possible to deal only with relatively small
parcels of information, and the total file size need be considered only for
storage, not for processing. Separation of the file into categories would also
facilitate additions and deletions. Further search efficiency would result
from organizing the man file by social security number, or by ordering both
man and job files by availability dates (e.g., vacancy date). This would
minimize the number of man/job records which would have to be examined in
the execution of a particular assignment task.

If the assignment matrix were limited to 100 jobs (men) at one time,
there would have to be at least 60 assignment runs per week to accommodate
the 6000 weekly assignments. The entire data base would occupy about 60
tapes, but if the file is structured, only appropriate ones would have to be
mounted at one time. It would be advantageous to use more than the required
number of tapes for storage so that a given category file could be found more
quickly and less extraneous data would have to be read in searching for the
appropriate file. This measure would also decrease the amount of information
to be copied when writing a new tape with corrections.

The major obstacle encountered in this configuration is updating the
data base. The update changes will have to be accumulated and incorporated
into the data file before it can be searched for a reassignment cycle. If each
individual's record is changed once a month, there will be approximately
42,000 updates per weekday. Assuming that the master file system is supplying
them over one standard telephone line (2 kc character transfer rate), and
assuming that the average update is 80 characters long, it will require
six hours of transmitting to convey the daily updates to the man/job system.
An additional channel exclusively for this update link may be needed by the
master file system. The master file is organized by social security number,
whereas the man/job system's files are ordered by job category number, and
perhaps further by availability date. Therefore, all incoming updates must
be cross-referenced and sorted before they can be incorporated into the tape
files. The most efficient way of sorting such a vast amount of information is
by sorting cards off-line, as follows. Each update message could be punched
on a card as it arrives, and the cards could then be sorted by social
security number. These cards would then be matched against a cross-reference
file of 15 million characters to obtain the proper category location for each
one. New cards would then be punched with the new information and sorted by
category, and finally merged with the appropriate tape files.
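
The card-handling sequence just described (sort by social security number, match against the cross-reference file, re-sort by job category) can be sketched in a few lines of Python. This is only an illustration of the three-step flow; the record layout, field names, and sample values are hypothetical and do not come from the study.

```python
from typing import NamedTuple


class Update(NamedTuple):
    ssn: str       # social security number (hypothetical sample values)
    change: str    # update text, about 80 characters in the study


def route_updates(updates, cross_reference):
    """Return (job category, update) pairs ordered by job category.

    cross_reference maps social security number -> job category number.
    """
    # Step 1: sort by social security number, the order of the master file
    # and of the cross-reference file.
    by_ssn = sorted(updates, key=lambda u: u.ssn)
    # Step 2: attach the job category found in the cross-reference file.
    tagged = [(cross_reference[u.ssn], u) for u in by_ssn]
    # Step 3: re-sort by job category. The sort is stable, so updates stay
    # in social security number order within each category.
    tagged.sort(key=lambda pair: pair[0])
    return tagged


updates = [Update("333", "new duty station"),
           Update("111", "promotion"),
           Update("222", "training completed")]
xref = {"111": "C20", "222": "A10", "333": "A10"}
routed = route_updates(updates, xref)
```

Each update passes through two sorts, which is the doubling of card volume that the following paragraph counts.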

The  scheme  just  mentioned  has  operational  drawbacks;  the  double  set 
of  cards  would  number  84,000  per  day.  The  sorting  alone  would  require  25 
hours  a  day  necessitating  the  use  of  full  shifts  of  two  parallel  sorting 
machines and personnel. This off-line processing load can be halved by
requiring the master file system to send all update information coded by job
category number (dotted path in Figure 4). In all probability this will require
an extension to the personnel files in the master file system. The master
file records are each just under 2,000 characters, so that the addition of a
9-digit number (if not currently in each individual's record) does not
constitute a large addition. If this arrangement could be made, the incoming
updates could be punched on cards to be sorted by job category number.

It  would  be  inefficient  to  update  files  which  were  not  being  examined 
if  for  no  other  reason  than  that  the  more  than  60  tapes  holding  the  file  would 
have  to  be  mounted  and  rewritten  with  the  update  information.  An  alternative 
is  to  save  all  cards  pertaining  to  a  job  category  until  it  is  due  to  be  recycled 
so  that  all  accumulated  updates  could  be  incorporated  in  one  pass  of  the  tape. 
However,  aside  from  the  obvious  card  storage  problem,  there  would  be  no  way 
of  establishing  historical  significance  of  any  of  the  update  information. 

The  configuration  described  is  unwieldy  because  of  its  dependence  on 
operating  personnel.  There  are  several  kinds  of  errors  to  which  this  system 
will  be  prone.  Sorting  mistakes,  where  a  card  is  out  of  order  or  where  a 
card has been sorted into the wrong category, can usually be corrected
manually by simply examining the rejected card. The errors which are more
difficult to deal with are garbling and loss. In these cases it is impossible
to recover the correct update from the card, and it will be necessary to have some
other  method  of  keeping  track  of  the  updates.  One  solution  would  be  to  write 
a  transaction  tape  as  the  incoming  update  cards  were  being  punched;  this 
could  be  used  for  recovery  if  the  file  updating  was  performed  frequently. 
Otherwise  it  would  be  impossible  to  know  which  tape  would  have  the  correct 
update  being  searched  for. 

If  the  master  file  system  had  access  to  each  person's  job  category 
number,  the  master  file  could  be  requested  to  send  an  entirely  new  data  base 
at  regular  intervals.  This  would  result  in  a  continually  evolving  data  base. 

It  would  require  about  240  hours  of  transmitting  to  convey  the  entire  man  file 
(2000  characters  each)  from  the  master  file  system  to  the  man/job  system. 

3.2  Configuration B


An alternative design is the same as shown in Figure 4 except that the
billion-character man and job files are stored on a large disc rather than on
tape. The master file system would be utilized in the same manner as in
Configuration A; it would supply updating information. Sorting personnel
and machines would no longer be necessary, but the computer system
would be considerably more complex.

If updating information were sent coded by job category, the disc file
could be immediately corrected. However, if the master file did not contain
this information, the updates could be sent by social security number
and written on tape. A job category number-social security number
cross-reference file could be held on the disc and accessed to determine which
records should be changed in the file. Based on an average disc access
time of 100 msec., it would require 1.16 hours a day to consult the
cross-reference file; this is the additional updating time needed if the job
category number does not accompany an update.
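
As a check on the figure quoted above, the daily lookup time follows from one cross-reference access per update, using the 42,000 updates per weekday estimated for Configuration A. The single access per update is an assumption of this worked example:

```latex
42{,}000 \ \text{updates/day} \times 0.100 \ \text{sec/access}
  = 4{,}200 \ \text{sec}
  = \frac{4{,}200}{3{,}600} \ \text{hours}
  = 1.1\overline{6} \ \text{hours per day}
```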

It has only been recently that random-access storage of the magnitude
required by the man/job file has been available. Storage equipment of this
size requires fairly expensive, fast, and large central processors.⁴
Incremental additions in storage are much less expensive per character than the
original bulk needed. For this reason there is no advantage to having the
cross-reference file, if necessary, on tape; it should be kept on the disc
so that the random-access feature can be used. If this configuration is
implemented, the system will have more capability than is necessary to perform
the man/job assignments. Unless there is other processing which could be
done on this system, the efficiency of this configuration may be very low.

3.3  Configuration C

Shifting more of the responsibility onto the master file system would
allow the man/job system to concern itself chiefly with processing the
assignment matrix. In this case, the man/job system would function as a
remote console communicating with the master file system for data but
performing its own processing. In review, the master file system has a
2000-character record for each member of the Air Force ordered by social
security number and the capability for on-line updating of its own file. The
master file would be relied upon to supply individual man file records on
demand as each category is recycled, as shown in Figure 5. A cross-reference
file could be constructed which would be ordered by job category number. This
tape would be searched for the appropriate category, and the successive
social security numbers of personnel in this job category would be sent to
the master file system for direct accessing from its storage. The individual
records would then be transmitted to the man/job system where they would
be written on tape. This subfile tape would then be used for processing of
the assignment matrix. As in any configuration, the approved assignments
must be sent back to the master file system as a normal update.


4  See Section II.3 for a survey of available storage equipment in the size
range required.

An advantage of this configuration is that the man/job system is
relieved of all update processing. If, as is suggested here, the master file
system were to transmit only the relevant man file records over one standard
(2 kc character transfer rate) telephone link, an entire week's assignment
load (6000 records of 2000 characters each) could be transmitted in one and
two-thirds hours. With the removal of update loads the need for sorting
machines and personnel vanishes.
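
The transmission estimate above can be reproduced with a one-line calculation. The sketch below assumes, as the one and two-thirds hour figure implies, that the "2 kc character transfer rate" means 2000 characters per second; the function name is illustrative only.

```python
def transmit_hours(records, chars_per_record, chars_per_second):
    """Hours needed to send a batch of fixed-length records over one line."""
    seconds = records * chars_per_record / chars_per_second
    return seconds / 3600.0


# One week's assignment load: 6000 records of 2000 characters each,
# sent at an assumed 2000 characters per second (6000 seconds in all).
weekly_hours = transmit_hours(6000, 2000, 2000)
```

At this rate each 2000-character record occupies the line for one second, so the weekly load takes 6000 seconds.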

A possible drawback is that the master file man record will probably
have to be extended in order to cover all information required for the
matching of qualifications for assignment. The prototype man/job system
contains 867-character man records; some of these characters will definitely
be redundant, but not all.

3.4  Configuration D

The logical extreme of the utilization of the postulated master file
system is to allot the entire assignment task to it. Incremental storage
additions to its 2 billion-character file will be relatively inexpensive per
character, and the tasks presently envisioned for the system⁵ allow assumption
of additional processing loads. This configuration makes the man/job system
look like a remote console with only input/output capability.

4.  Assignment Algorithm


The  assignment  problem  can  be  stated  as  follows:  given  n  people  and 
n  jobs  available  for  assignment,  determine  the  "best  assignment"  such  that 
only  one  person  is  assigned  to  one  job.  "Best  assignment"  refers  to  some 
maximization  of  the  payoff  values  assigned  to  each  man/job  combination. 
Three such optimization techniques were investigated briefly and are listed
below.


5  See Section II.3.

6  Numbers in brackets designate references included at the end of this
report.

The Decision Index Method [11]⁶ consists of successive modifications
of a man/job payoff matrix. The first step is to compute the Decision Index
for each element of the matrix and then select the element with the highest
Decision Index as the first assignment. That row and column are then deleted
from the original value matrix and a new set of indexes is calculated. The
highest index is again chosen as the assignment. The procedure is continued
until all personnel or jobs have been assigned. Personnel assignment
resulting from this technique is not optimal; however, the logic and
computations are relatively simple. To obtain a timing estimate, a 100 x 100
matrix is chosen as an example. The first assignment requires $n^2 = 10,000$
Decision Index computations and 10,000 comparisons to select the maximum value.
To compute the next assignment there are $(n-1)^2$ computations and
comparisons. The total number is the sum of the squares,
$n^2 + (n-1)^2 + \dots + 3^2 + 2^2$, which is asymptotic to $n^3/3$ for
large n. For the 100 x 100 example,
there are 338,349 computations and an equal number of comparisons to be
made. Depending on the speed of the processor chosen, time can be
estimated; conservatively estimating 1 msec for each operation, the entire
matrix could be solved in about 12 minutes. This technique requires a working
storage of $2n^2$ locations (20,000 for a 100 x 100 matrix).
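
The select-and-delete procedure described above can be sketched as follows. The formula for the Decision Index is not given in this report (it is defined in reference [11]), so the sketch takes it as a replaceable function and, purely as a placeholder assumption, uses the raw payoff value in its place. All function and variable names are illustrative.

```python
def raw_payoff_index(payoff, i, j, rows, cols):
    """Placeholder for the Decision Index of reference [11]: the raw payoff."""
    return payoff[i][j]


def greedy_assign(payoff, index_fn=raw_payoff_index):
    """Pick the highest-index (man, job) pair, delete its row and column, repeat.

    Returns (list of (man, job) pairs, number of index computations).
    """
    rows = list(range(len(payoff)))
    cols = list(range(len(payoff[0])))
    assignments, computations = [], 0
    while rows and cols:
        if len(rows) == 1 and len(cols) == 1:
            # The last pair is forced, so no index needs computing; this is
            # why the sum of squares in the text stops at 2 squared.
            assignments.append((rows[0], cols[0]))
            break
        best = None
        for i in rows:
            for j in cols:
                value = index_fn(payoff, i, j, rows, cols)
                computations += 1
                if best is None or value > best[0]:
                    best = (value, i, j)
        _, i, j = best
        assignments.append((i, j))
        rows.remove(i)
        cols.remove(j)
    return assignments, computations


payoff = [[10, 9],
          [9, 1]]
pairs, count = greedy_assign(payoff)
greedy_total = sum(payoff[i][j] for i, j in pairs)
```

With the placeholder index, the 2 x 2 example takes the payoff of 10 first and is then forced to accept 1, for a total of 11, although the other pairing yields 18. This illustrates why a select-and-delete procedure need not be optimal.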

The Hungarian Method [12] is a way of manipulating a value matrix to
yield a solution which is optimal in the sense that the payoff of the
assignment matrix is truly maximized. Because the calculations are simple and
there  are  fewer  of  them  than  in  the  previous  example,  it  is  assumed  that  this 
technique  would  require  less  processing  time. 
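
For illustration, one common modern formulation of the Hungarian Method is sketched below: a shortest-augmenting-path version that maintains row and column potentials. It is not necessarily the procedure of reference [12], and the function name and test matrix are assumptions of this example. Payoffs are negated so that the minimizing routine maximizes total payoff.

```python
def max_payoff_assignment(payoff):
    """Return jobs[i], the job given to man i, maximizing total payoff.

    Assumes a square payoff matrix (n men, n jobs).
    """
    n = len(payoff)
    cost = [[-value for value in row] for row in payoff]  # maximize via negation
    inf = float("inf")
    u = [0] * (n + 1)        # row potentials (1-based)
    v = [0] * (n + 1)        # column potentials (1-based)
    match = [0] * (n + 1)    # match[j] = row assigned to column j (0 = none)
    way = [0] * (n + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    reduced = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if reduced < minv[j]:
                        minv[j], way[j] = reduced, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:      # reached an unassigned column
                break
        while j0:                   # flip assignments back along the path
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    jobs = [0] * n
    for j in range(1, n + 1):
        jobs[match[j] - 1] = j - 1
    return jobs


payoff = [[10, 9],
          [9, 1]]
jobs = max_payoff_assignment(payoff)
optimal_total = sum(payoff[i][j] for i, j in enumerate(jobs))
```

On this 2 x 2 matrix the routine pairs each man with the job paying 9, for a total of 18, which is the maximum over both possible pairings.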

The personnel assignment problem can also be formulated as a classical
linear programming problem, the transportation problem. It has been
studied  extensively  and  optimal  solutions  exist. 
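
For reference, the standard linear programming statement of the assignment problem is given below. The symbols are assumptions of this note rather than the report's notation: $c_{ij}$ is the payoff of assigning man i to job j, and $x_{ij}$ is 1 if that assignment is made and 0 otherwise.

```latex
\begin{aligned}
\text{maximize}\quad   & \sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij}\,x_{ij} \\
\text{subject to}\quad & \sum_{j=1}^{n} x_{ij} = 1, \qquad i = 1,\dots,n \\
                       & \sum_{i=1}^{n} x_{ij} = 1, \qquad j = 1,\dots,n \\
                       & x_{ij} \ge 0 .
\end{aligned}
```

The integrality of $x_{ij}$ need not be imposed separately, because every basic optimal solution of this program already has all $x_{ij}$ equal to 0 or 1.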

5.  Design Alternatives

The configuration of the proposed man/job matching system must be
viewed within the context of shared responsibilities with the master file. The
respective delegation of duties hinges on the degree to which the study
system is balanced in size and speed to the master file system. If, at one
extreme, the man/job system is designed to have its own data base storage,
handle its own processing, and rely on the master file system only for
updating information, it becomes an independently scheduled and autonomous
computer system. At the other end of the spectrum, the man/job system is a
console; in this case its limited capability should be fully utilized in
performing the assignments and answering queries, or assuming any other new
functions. Since it is so dependent on the master file system for data and
updating, the console should do all the processing possible so as to be a
minimum burden to the larger system. In other words, the more balanced the
two systems are, the more the total load should be distributed. If the systems
have vastly different capabilities, the smaller one should be fully utilized,
leaving the larger system more flexible.

The characteristics to be considered in choosing a system configuration
have both quantitative and qualitative trade-offs. The salient features are
total data storage and transmission equipment required, resource utilization,
and cost. Attention must be paid to the capacity for expansion; often
growth is not in predictable directions. The original designers of the master
file system discussed here may not have envisioned the man/job match
function which the system will serve. A rigid configuration like A allows
very little expansion in size and almost no alteration in the role of the
system. On the other hand, if the man/job system looks more like a console,
time-sharing its tasks with a larger system, there are immediate advantages
in flexibility and the ability to adapt the system to new demands. For
example, there may be future need for querying of the master file system via
the consoles. As another example, the console might serve as a statistics
gathering center.

6.  Recommendations


Referring to the Summary Chart (Figure 6), Configuration D or C is
recommended because of its capacity for growth in any direction, its freedom
from the update load, and its independent operational procedures. This
design also represents more efficient use of resources in that the man files do
not have to be maintained redundantly, and the additional updating traffic
is avoided.

Based on 3-year duty cycles and one update/month/man.


SECTION  IV 


STUDY OF A TIME-SHARED PERSONNEL SYSTEM


1.  Introduction


This section will describe an analytical study of some of the performance
and design characteristics of an on-line, time-shared personnel system
which could be implemented using presently available equipment.

The original intent of the study was to compare the performance and
cost of two general types of implementation of such a system. Both types
were characterized by a centralized data storage facility which could be
accessed and modified by a large number of geographically dispersed users
who communicate with the system in an on-line fashion. The first type of
system design provided for centralized computation as well as data storage,
with no computational capability at the remote terminals; the second type
provided for centralized data storage and retrieval, but computational
capability to operate on retrieved data was dispersed to the remote sites.

As the performance data were determined for the applications in question,
the clear superiority of the first type of system, that of centralized
computation as well as data storage, became evident, and consequently
emphasis was placed on this type. Paragraph 3 of this section discusses the
reasoning that justifies this decision.

2.  Centralized Computation System

The characteristics of the hypothetical system studied with this model
are not peculiar to personnel management systems, but are applicable to
many other types of command and control functions. Principally, these
characteristics are: multiple simultaneous users; a large, structured,
random-access data base (billions of characters); a high update load; priority
schemes; and a low proportion of computational load compared with data base
maintenance load. The bulk data storage is assumed to be provided by disc
or drum units.

2.1  Assumed System Functions

As applied to the personnel management function, the system would
provide a centralized facility to serve both personnel and payroll users. The
functions of the system can be categorized broadly as: (1) Man/Job
assignments; (2) Updates to Centralized Data Base; and (3) Queries of Data Base.
The man/job assignments involve many separate steps. First, the personnel
and job files must be searched for available men and jobs. These will be
ordered in matrix form, to which optimization methods can be applied. This
last step represents the only significant computational load considered for
the system, and it will be assumed that such computations will be performed
as a "background" task for the processor while it is waiting for inputs from
consoles or for data to be retrieved from the bulk storage unit. Updates
require no significant computer processing, but do require two accesses of the
bulk storage to find and correct the update item. Queries represent multiple
retrievals from the disc unit, with a moderate amount of computer processing
applied to each retrieval, using Boolean comparisons of the retrieved records
with the request formulation.

2.2  Description of Model Studied

A functional model of the centralized system is shown in Figure 7. The
shapes of the functional blocks correspond to the conventions of the GE
System Modeling and Simulation Technique.⁷ Circular shapes represent
originations in the data flow; oval shapes represent temporary storage or
buffering; rectangular shapes represent processing facilities, with
consequent delays; fan-in and fan-out shapes represent merging and routing
functions; and trapezoidal shapes represent terminations for data flow.

2.3  Performance Characteristics to be Estimated

The performance characteristics of interest are as follows:

•  Average response time for normal users and priority users;

•  Statistics of utilization factors for central processor and
disc access channels; and

•  Statistics of queue lengths at buffers.

2.4  System Design and Load Factors

Values for the above performance characteristics were estimated by
performing simulation runs on the model, with the following as design
constants or variable parameters:

•  Arrival Load for Queries, treating all users as a single
composite input load

•  Percentage of Query Types: Retrievals, Updates, and
Computation Requests⁸

•  Number of disc accesses required to satisfy retrieval queries
(influences the number of computer/disc interchanges
necessary)

•  Amount of data transferred from disc to CPU

•  Number of access channels between CPU and disc

•  Number of search arms on disc (determines number of
simultaneous accesses)

•  CPU processing time for each query type

•  Disc search time per access

•  Disc to CPU transfer time as function of data block length


7  For a description of this technique, see ESD TDR-63-612.


Figure 7.  GPSS Flow Chart


2.5  Assumptions on System Design

In  performing  the  simulation  study,  a  hypothetical  design  was  assumed 
having  the  following  characteristics: 

•  Individual, non-shared communication channels between
consoles and CPU;

•  No delays in transfer of query to CPU;

•  Queues for computer service are ordered FIFO; but

•  Retrieval-type queries are allowed to interrupt service on a
computation;

•  When CPU traps to an interrupt, results of the computations
in process are not lost;

•  All data and programs are stored randomly on a random-access
disc;

•  Priority schemes not based on the originator of the query,
except for the special takeover type, in which the extraordinary
priority might be assigned through the importance
of the requestor.


8  In the actual simulation, only retrievals and updates were considered as
query types; the capability for computations can be estimated by examining
the available CPU time not used for these types, and considering that
background computation can be carried out during these intervals.

In  addition,  for  purposes  of  the  simulation,  the  following  computational 
assumptions  were  made: 

•  Interval between arrivals of queries from consoles is
exponentially distributed;

•  Delay at the disc while transferring data to the CPU in
response to retrieval requests is uniformly distributed;

•  Time to transfer records in response to update requests is
constant;

•  Number of accesses of disc to satisfy retrieval requests is
uniformly distributed with mean of 7, and required CPU
processing time resulting from each access is uniformly
distributed with a mean of 10,000 μsec;

•  Times for executive housekeeping and initiation of I/O are
included in CPU processing time distribution for each loop;

•  No errors or failures take place;

•  Updates require only two disc accesses, and essentially
no processing time.
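
To make these assumptions concrete, the sketch below samples a workload and passes it through a single first-in, first-out service point. It is a deliberately reduced illustration, not the GPSS model of this study: the CPU, access channels, and disc arms are collapsed into one server, interrupts are ignored, and the range of 1 to 13 accesses (giving the stated mean of 7) and the flat 0.100 second charged per access are assumed values.

```python
import random


def fifo_response_times(interarrivals, services):
    """Response time (waiting plus service) of each request at one FIFO server."""
    clock = 0.0      # arrival time of the current request
    free_at = 0.0    # time at which the server next becomes idle
    responses = []
    for gap, service in zip(interarrivals, services):
        clock += gap
        start = max(clock, free_at)    # wait if the server is still busy
        free_at = start + service
        responses.append(free_at - clock)
    return responses


def sample_workload(n, mean_interarrival, seed=0):
    """Draw n requests: exponential arrival gaps, uniform access counts."""
    rng = random.Random(seed)
    gaps = [rng.expovariate(1.0 / mean_interarrival) for _ in range(n)]
    # Assumed: 1 to 13 accesses per request (mean 7), 0.100 sec per access.
    services = [rng.randint(1, 13) * 0.100 for _ in range(n)]
    return gaps, services


gaps, services = sample_workload(50, mean_interarrival=2.0)
responses = fifo_response_times(gaps, services)
```

Shortening `mean_interarrival` toward the mean service time of 0.7 second makes the waiting component grow without bound, which is the single-server analogue of the saturation reported for Run 2.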

2.6  GPSS Simulation

The model described in paragraph 2.2 of this section was rendered
into the form required for computer simulation by the GPSS III (General
Purpose System Simulator) program for the IBM 7094. The rendition is
organized around the sequence of steps given in Figure 8 and results in the
flow chart of Figure 9. The computer input listing corresponding to this flow
chart is given as Figure 10.

2.7  Simulation Results


A limited number of simulation runs were made to estimate performance
for a nominal system and to determine the conditions under which
saturation would result. The results from three runs are of most interest.
These runs are summarized as follows:

Run 1:  Lightly loaded system, limited disc access equipment;

Run 2:  Same disc access equipment, heavier load, saturated
condition; and

Run 3:  Heavy load, added access capability, improved
performance.


Figure 10.  GPSS III Assembly Input for Run 3
Table 1 gives a summary of these results, together with a listing of the
values  of  the  design  and  load  factors  for  each  run.  It  is  not  claimed  that 
these  runs  examine  the  performance  of  a  spectrum  of  possible  designs;  they 
were  conducted  to  illustrate  the  magnitude  of  loads  that  could  be  handled 
with  currently  available  equipment  configurations.  The  conclusions  which 
can  be  drawn  from  these  results  are  presented  in  the  following  paragraphs. 

2.8  Discussion of Results


The results of most significant interest pertain to the response time
and utilization columns. The response time results show the relatively
marginal value of incorporating elaborate priority schemes for expediting access
to the CPU by special users. For the system studied, and any similar system
requiring sharing of both a CPU and data bank, the processor is rarely
fully utilized (as may be seen from the utilization columns). However, it
seems to be common in the design of such systems to devote much attention
to multi-level priority access schemes for CPU service, while in fact the real
bottleneck appears at the access to the data bank. Referring to Table 1, it is
seen that the average service time for both normal and special requests, for
Runs 1 and 3 where no saturation was experienced, can be calculated by
multiplying the average access time by the number of accesses required for
the requests. For Run 1, the average data retrieval time was 100 msec (90
for access, 10 for readout); for Run 3, the average retrieval time was (for
R-type requests) 130 msec. For R-type requests, which predominate two to
one for Run 1 and three to one for Run 3, the corresponding average response
times for the assumed mean of 7 accesses would be .7 and .91 seconds,
assuming no waiting in queues. Factoring in the shorter response times for
update-type requests, it is easy to see that the actual average response
times of .67 and .84 seconds for the two runs represent almost entirely the
summation of access delays while going through the necessary calls on bulk
memory to satisfy the requests, and that little delay was experienced in
queues. No priority algorithm can speed this process, since the speed with
which the disc arm finds the proper location for the address being accessed
cannot be increased.
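
The no-queue response estimates for R-type requests follow directly from the mean of 7 disc accesses per retrieval request assumed in paragraph 2.5 and the average retrieval times quoted for the two runs:

```latex
\begin{aligned}
\text{Run 1:}\quad & 7 \ \text{accesses} \times 100 \ \text{msec} = 700 \ \text{msec} = 0.70 \ \text{sec} \\
\text{Run 3:}\quad & 7 \ \text{accesses} \times 130 \ \text{msec} = 910 \ \text{msec} = 0.91 \ \text{sec}
\end{aligned}
```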

The conclusion to be drawn from the results presented is that in a
multi-user time-shared storage and retrieval system using contemporary
equipment configurations, the speed of computation is far out of proportion
to the speed of access to the data base. Priority algorithms are not
applicable, since they can only be applied to the re-ordering of positions in
queues, and this cannot solve the irreducible delays in data access.

Comparing Run 2 with Run 1, it is observed that the system can be
driven into saturation by only a minor change in load and delay parameters.
Run 1 displays characteristics of a highly underloaded system; buffers are
rarely occupied and utilization rates are quite nominal. The difference
between Run 1 and Run 2 is a 33 percent increase in input rate and a 20
percent increase in the access and delay time. This change was sufficient to
make the system reach saturation after only 12 seconds of simulated
operation. Based on this admittedly incomplete simulation, it might be concluded
that systems having these general characteristics saturate when their input
limit threshold is exceeded only briefly.
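
One way to read the Run 1 to Run 2 change is through the utilization of a single service point, which is the arrival rate multiplied by the mean service time. The sketch below combines the two quoted increases under that simplifying assumption. The utilization values of the actual runs are in Table 1 and are not reproduced here, so the base utilization of 0.65 is purely hypothetical.

```python
def scaled_utilization(base_utilization, load_factor, service_factor):
    """Utilization of one service point after load and service time both grow."""
    return base_utilization * load_factor * service_factor


# 33 percent more input and 20 percent longer access and delay time.
growth = scaled_utilization(1.0, 1.33, 1.20)    # combined multiplier
threshold = 1.0 / growth                        # base utilization that reaches 100 percent

# Hypothetical base utilization of 0.65 for the busiest service point.
after_change = scaled_utilization(0.65, 1.33, 1.20)
```

Under this reading the two changes compound to roughly a 60 percent increase in offered work, so a service point that is a little under two-thirds busy would be pushed past full utilization.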

3.  Decentralized Computation Model

At the initiation of this study, it had been planned to model and
simulate two alternative implementations for personnel data management systems,
each of which would offer centralized data storage to permit remote access
to the most current data, but which differed in the placement of the
computation function. A system providing a small, fast computer at each remote
site, communicating with centralized data storage, appeared as an attractive
alternative to the system concept described in the preceding section. However,
investigation of the power of the centralized computation system quickly
revealed that the decentralized computers would prove far too expensive to be
practical.

The system as described in the preceding section could easily
accommodate hundreds of simultaneous users, each having the capability to
perform substantial computational tasks on retrieved data. Its cost, including
communications, has been estimated in the range of $2,000,000 to $3,000,000.
Of this, only about 15 percent constitutes the cost of the actual CPU. To
disperse this computational power in a number of remote locations requires
installation of remote computers, each of which should be capable of
executing the same computations as those performed centrally. In addition, the
communication lines would have to be given additional capacity, since the
disc-computer data transfers constitute much more data than the
console-computer data transfers. Small, fast computers having the computational
capability assumed for the centralized system are priced in the $80,000
range without bulk memory. Hence, it is estimated that only for fewer than
6 simultaneous users would the decentralized computation alternative be
economically justified, and the added communication costs might make the
threshold even lower.
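
The break-even estimate can be reproduced from the figures quoted above by asking how many $80,000 remote computers cost as much as the central CPU. The function name and its default arguments are illustrative, and the sketch assumes one remote computer per simultaneous user and ignores the added communication cost.

```python
def breakeven_sites(system_cost, cpu_share=0.15, remote_computer_cost=80_000):
    """Number of remote computers whose total price equals the central CPU cost."""
    return system_cost * cpu_share / remote_computer_cost


low = breakeven_sites(2_000_000)     # central CPU about $300,000
high = breakeven_sites(3_000_000)    # central CPU about $450,000
```

The two ends of the cost range give break-even points of about 3.75 and 5.6 remote computers, consistent with the stated threshold of fewer than 6 simultaneous users.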
4.  Extension of the Model to Other Systems

While the system model, described in paragraph 2 of this section,
was constructed and tested for a typical personnel data management
application, it is applicable without significant change to a large number of
other types of data management systems, including a variety of command
and control applications. The basic functions are found in other systems
requiring multiple-user, on-line access to a large data base. Such
requirements are found in logistic, air traffic control, surveillance,
transportation management, intelligence, and many other similar systems. A
modified form of the same model could be very useful for determining the
capability of alternative processing and data handling subsystem designs for
implementation of improved versions of any of these applications.

APPENDIX I

FILE  PARTITIONING  ANALYSIS 


Suppose  a  file  consisting  of  N  records  is  divided  into  M  subfiles. 

Let the subfiles be indexed from i = 1 to i = M. Let the length of the i-th
subfile be $L_i$. Then

    $\sum_{i=1}^{M} L_i = N$

Let  us  suppose  that  searching  is  done  as  follows: 

A  batch  of  n  search  qualifiers  is  served  at  a  time.  The  search  is  done 
by  going  through  the  subfiles  in  order,  from  i  =  1  to  i  =  M.  For  each  sub¬ 
file,  it  is  possible  to  determine  whether  any  of  the  search  qualifiers  in  the 
batch  pertain  to  that  subfile.  If  none  do,  one  does  not  read  any  records  from 
that  subfile,  but  instead  goes  to  the  next  subfile.  If  some  search  qualifiers 
do  pertain  to  the  given  subfile,  one  reads  the  records  from  this  subfile 
serially,  until  one  has  found  all  the  pertinent  records,  and  then  one  goes  to 
the  next  subfile.  (The  remaining  records  of  the  given  subfile  are  unread.) 
Each  search  qualifier  describes  exactly  one  record  in  the  whole  file.  It  is 
assumed  that  the  n  search  qualifiers  of  a  given  batch  describe  n  records  in 
the file, i.e., no two qualifiers of a given batch describe the same record.

It  is  further  assumed  that  any  record  of  the  file  is  as  likely  to  be  described 
by  an  incoming  search  qualifier  as  any  other  record. 
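
The search procedure just described can be checked empirically. The following is a minimal Monte Carlo sketch under the stated assumptions (each qualifier names one distinct record, all records equally likely); the function name `simulate_records_read` and its parameters are illustrative and do not come from the report.

```python
import bisect
import random

def simulate_records_read(subfile_lengths, n, trials=20000, seed=0):
    """Estimate the average number of records read per batch of n qualifiers."""
    rng = random.Random(seed)
    # Starting offset of each subfile in a global numbering 0 .. N-1.
    starts = []
    offset = 0
    for length in subfile_lengths:
        starts.append(offset)
        offset += length
    total_records = offset
    total_read = 0
    for _ in range(trials):
        # deepest[i] = position (1-based) of the last wanted record in subfile i;
        # a subfile with no wanted record is skipped and contributes 0 reads.
        deepest = [0] * len(subfile_lengths)
        for rec in rng.sample(range(total_records), n):
            i = bisect.bisect_right(starts, rec) - 1
            deepest[i] = max(deepest[i], rec - starts[i] + 1)
        total_read += sum(deepest)
    return total_read / trials
```

Each subfile is read serially only up to its last wanted record, so the number of records read in a subfile equals the position of that record.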

Let us define X as:

X = the number of records of the file which are read during the process of serving a batch of n search qualifiers.

It is desired to find

E(X) = the average value of X.

In order to do this, let us define

$X_i$ = the number of records of the i-th subfile which are read during the process of serving a batch of n search qualifiers.

Now since

$$X = \sum_{i=1}^{M} X_i$$

we have

$$E(X) = \sum_{i=1}^{M} E(X_i) \tag{1}$$

where $E(X_i)$ is the average value of $X_i$. Thus the problem reduces to finding the $E(X_i)$.


In order to compute $E(X_i)$ we begin by computing $P(X_i \le L_i - k)$, which is the probability that $X_i \le L_i - k$, where k is any integer such that $0 \le k \le L_i$. To do this, number the search qualifiers from j = 1 to j = n, and let us define

$C_j$ = the event that the j-th search qualifier does not refer to the $(L_i - k + 1)$-th through $L_i$-th records of the i-th subfile.

Then

$$\begin{aligned}
P(X_i \le L_i - k) &= P(C_1 C_2 \cdots C_n) \\
&= P(C_1) \cdot P(C_2 \mid C_1) \cdots P(C_n \mid C_1 \cdots C_{n-1}) \\
&= \frac{N-k}{N} \cdot \frac{N-k-1}{N-1} \cdots \frac{N-k-(n-1)}{N-(n-1)}
\end{aligned} \tag{2}$$

For any k such that $0 \le k < L_i$,

$$\begin{aligned}
P(X_i = L_i - k) &= P(X_i \le L_i - k) - P(X_i \le L_i - k - 1) \\
&= \frac{N-k}{N} \cdots \frac{N-k-(n-1)}{N-(n-1)} - \frac{N-(k+1)}{N} \cdots \frac{N-(k+1)-(n-1)}{N-(n-1)} \\
&= \frac{N-k}{N} \cdots \frac{N-k-(n-1)}{N-(n-1)} \left[ 1 - \frac{N-(k+1)-(n-1)}{N-k} \right] \\
&= \frac{N-k}{N} \cdots \frac{N-k-(n-1)}{N-(n-1)} \cdot \frac{1+(n-1)}{N-k} \\
&= \frac{n}{N} \cdot \frac{N-k-1}{N-1} \cdots \frac{N-k-(n-1)}{N-(n-1)}
\end{aligned} \tag{3}$$



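which can now be used to compute $E(X_i)$:

$$E(X_i) = \sum_{k=0}^{L_i} (L_i - k) \cdot P(X_i = L_i - k) = \sum_{k=0}^{L_i - 1} (L_i - k) \cdot P(X_i = L_i - k)$$

$$E(X_i) = \sum_{k=0}^{L_i - 1} (L_i - k) \cdot \frac{n}{N} \cdot \frac{N-k-1}{N-1} \cdots \frac{N-k-(n-1)}{N-(n-1)} \tag{4}$$

$$E(X_i) = \frac{n}{N (N-1) \cdots (N-(n-1))} \sum_{k=0}^{L_i - 1} (L_i - k)(N-k-1) \cdots (N-k-(n-1)) \tag{5}$$
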


In order to evaluate the sum in (5), let us put

$$P_n(k) = k \cdot (k-1) \cdots (k-(n-1)) \tag{6}$$

and put

$$Q_n(k) = k \cdot (k-1) \cdots (k-(n-1)) \cdot (k-n) = P_n(k) \cdot (k-n) \tag{7}$$

Lemma. For any integers A, B with A < B,

$$\sum_{k=A}^{B} P_n(k) = \frac{1}{n+1} \left[ Q_n(B+1) - Q_n(A) \right] \tag{8}$$

Proof.

$$\begin{aligned}
Q_n(k+1) - Q_n(k) &= (k+1) \cdot k \cdots (k-(n-1)) - k \cdot (k-1) \cdots (k-n) \\
&= k \cdot (k-1) \cdots (k-(n-1)) \left[ (k+1) - (k-n) \right] \\
&= (n+1) \cdot k (k-1) \cdots (k-(n-1)) \\
&= (n+1) \cdot P_n(k).
\end{aligned}$$

Hence,

$$\sum_{k=A}^{B} P_n(k) = \frac{1}{n+1} \sum_{k=A}^{B} \left[ Q_n(k+1) - Q_n(k) \right] = \frac{1}{n+1} \left[ Q_n(B+1) - Q_n(A) \right]$$

Q.E.D.
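
The lemma is easy to spot-check numerically. The sketch below uses illustrative helper names (`falling_product`, `lemma_holds`) that are not part of the report, and compares both sides of the identity in exact integer arithmetic, with $P_n(k)$ the product of n consecutive descending factors starting at k and $Q_n(k)$ the same product with n+1 factors.

```python
def falling_product(k, terms):
    """Return k * (k-1) * ... * (k-(terms-1)); the empty product is 1."""
    result = 1
    for j in range(terms):
        result *= k - j
    return result

def P(n, k):
    # P_n(k): n descending factors starting at k.
    return falling_product(k, n)

def Q(n, k):
    # Q_n(k) = P_n(k) * (k - n): n + 1 descending factors starting at k.
    return falling_product(k, n + 1)

def lemma_holds(n, A, B):
    """Check (n+1) * sum_{k=A..B} P_n(k) == Q_n(B+1) - Q_n(A)."""
    lhs = (n + 1) * sum(P(n, k) for k in range(A, B + 1))
    return lhs == Q(n, B + 1) - Q(n, A)
```

Because the proof is a telescoping sum, the check should succeed for any integer limits, including negative ones.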


In order to use the lemma in evaluating the sum in (5) we write

$$(L_i - k) = (L_i - N) + (N - k)$$

so that the sum in (5) may be written

$$\begin{aligned}
\sum_{k=0}^{L_i-1} (L_i - k)(N-k-1) \cdots (N-k-(n-1))
&= (L_i - N) \sum_{k=0}^{L_i-1} (N-k-1) \cdots (N-k-(n-1)) + \sum_{k=0}^{L_i-1} (N-k)(N-k-1) \cdots (N-k-(n-1)) \\
&= \frac{L_i - N}{n} \cdot N (N-1) \cdots (N-(n-1)) + \frac{1}{n+1} \cdot (N+1) \cdot N \cdots (N-(n-1)) \\
&\quad - \left[ \frac{L_i - N}{n} + \frac{N - L_i + 1}{n+1} \right] \cdot (N-L_i) \cdots (N-L_i-(n-1)) \\
&= \frac{L_i - N}{n} \cdot N (N-1) \cdots (N-(n-1)) + \frac{1}{n+1} \cdot (N+1) \cdot N \cdots (N-(n-1)) \\
&\quad + \frac{N - L_i - n}{n(n+1)} \cdot (N-L_i) \cdots (N-L_i-(n-1))
\end{aligned}$$

Substituting this expression for the sum into (5),

$$E(X_i) = (L_i - N) + \frac{n}{n+1} \cdot (N+1) + \frac{N-L_i-n}{n+1} \cdot \frac{N-L_i}{N} \cdots \frac{N-L_i-(n-1)}{N-(n-1)} \tag{9}$$

which can be written as:

$$E(X_i) = L_i - \frac{N-n}{n+1} + \frac{N-L_i-n}{n+1} \cdot \frac{N-L_i}{N} \cdots \frac{N-L_i-(n-1)}{N-(n-1)} \tag{9'}$$



Putting

$$K = \frac{(N-L_i)! \cdot (N-n)!}{N! \cdot (N-L_i-n-1)!} \tag{10}$$

we then have

$$E(X_i) = L_i - \frac{N-n}{n+1} + \frac{K}{n+1} \tag{11}$$

If we substitute $L_i + (N - L_i)$ for N in (11), we can rewrite (11) as

$$E(X_i) = \frac{n}{n+1} (L_i + 1) - \frac{N-L_i}{n+1} + \frac{K}{n+1} \tag{11'}$$
Computation for small n.

If n is small, there is no difficulty in making the computation directly. For instance, if N = 1000, $L_i$ = 100, and n = 3, then substituting directly into (9):

$$E(X_i) = 100 - \frac{997}{4} + \frac{897}{4} \cdot \frac{900}{1000} \cdot \frac{899}{999} \cdot \frac{898}{998}$$

The last term is evaluated with common logarithms:

    log 900 =  2.9542425        log 1000 = 3.0000000
    log 899 =  2.9537597        log  999 = 2.9995655
    log 898 =  2.9532763        log  998 = 2.9991305
    log 897 =  2.9527924        log    4 = 0.6020600
    sum     = 11.8140709        sum      = 9.6007560

    11.8140709 - 9.6007560 = 2.2133149
    antilog (2.2133149) = 163.424

E(X_i) = 100 - 249.25 + 163.424 = 14.174

If this file of 1000 records consists of 10 equal subfiles, each of 100 records, then by (1):

$$E(X) = \sum_{i=1}^{10} E(X_i) = \sum_{i=1}^{10} 14.174 = 10 \times 14.174 = 141.74$$
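
The worked example can be reproduced without logarithm tables. This is a minimal sketch using exact rational arithmetic; the function names are illustrative, and the formula coded is the one labelled (9'), namely $L_i - (N-n)/(n+1)$ plus $(N-L_i-n)/(n+1)$ times the product of the ratios $(N-L_i-j)/(N-j)$ for j = 0, ..., n-1.

```python
from fractions import Fraction

def expected_reads_subfile(N, L, n):
    """E(X_i) for a subfile of length L in a file of N records, batch size n."""
    ratio = Fraction(1)
    for j in range(n):
        ratio *= Fraction(N - L - j, N - j)
    return L - Fraction(N - n, n + 1) + Fraction(N - L - n, n + 1) * ratio

def expected_reads_file(subfile_lengths, n):
    """E(X) as the sum of E(X_i) over all subfiles, as in equation (1)."""
    N = sum(subfile_lengths)
    return sum(expected_reads_subfile(N, L, n) for L in subfile_lengths)
```

For example, `float(expected_reads_subfile(1000, 100, 3))` evaluates the case treated above, and `expected_reads_file` accepts unequal subfile lengths as well.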


An inequality for $E(X_i)$

For some applications it can be useful to have a very rough idea of the size of $E(X_i)$. It is possible to obtain an inequality for $E(X_i)$ which might help in giving an estimate. Define

$N_i$ = the number of search qualifiers of a given batch which describe records of the i-th subfile.

Then $N_i$ is an integer-valued random variable, with $0 \le N_i \le L_i$. Now

$$E(X_i) = \sum_{j=0}^{L_i} E(X_i \mid N_i = j) \cdot P(N_i = j)$$

Now it has been shown in (2) that

$$E(X_i \mid N_i = j) = \frac{j}{j+1} (L_i + 1),$$

hence

$$E(X_i) = (L_i + 1) \cdot \sum_{j=0}^{L_i} \frac{j}{j+1} \cdot P(N_i = j)$$

Now the function

$$g(j) = \frac{j}{j+1}$$

is concave on the interval $0 \le j < \infty$; since one has $0 \le P(N_i = j) \le 1$ for each j ($0 \le j \le L_i$) and since $\sum_{j=0}^{L_i} P(N_i = j) = 1$, it follows that

$$\sum_{j=0}^{L_i} g(j) \cdot P(N_i = j) \le g\left( \sum_{j=0}^{L_i} j \cdot P(N_i = j) \right)$$

(see reference [1] for example). Now

$$\sum_{j=0}^{L_i} j \cdot P(N_i = j) = E(N_i)$$

and it is clear that

$$E(N_i) = \frac{n L_i}{N}$$

thus

$$E(X_i) \le (L_i + 1) \cdot \frac{n L_i}{n L_i + N} \tag{12}$$


For example, let us consider the case which was treated previously: N = 1000, $L_i$ = 100, n = 3. Then by (12),

$$E(X_i) \le \frac{3}{13} \cdot (101)$$

giving

$$E(X_i) \le 23.308$$

This may be compared to the exact answer obtained above, $E(X_i)$ = 14.174.
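
As a quick illustration, the bound can be evaluated in a couple of lines. This sketch assumes the bound takes the form $(L_i + 1) \cdot g(E(N_i))$ with $g(j) = j/(j+1)$ and $E(N_i) = nL_i/N$, as the argument above indicates; the function name is illustrative.

```python
def upper_bound_reads(N, L, n):
    """Rough upper bound on E(X_i) obtained from the concavity of g(j) = j/(j+1)."""
    mean_hits = n * L / N  # E(N_i): expected number of qualifiers falling in the subfile
    return (L + 1) * mean_hits / (mean_hits + 1)
```

The bound needs only one division and one multiplication per subfile, which is why it can be attractive when a rough estimate is enough.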


Computation for moderately large n.

According to Stirling's formula,

$$M! \sim \sqrt{2\pi} \cdot M^{M + 1/2} \cdot e^{-M} \tag{13}$$

or

$$\log (M!) \sim \tfrac{1}{2} \log 2\pi + (M + \tfrac{1}{2}) \log M - M \log e \tag{14}$$

for any integer M which is sufficiently large. The relative error R of Stirling's formula is estimated by the following inequalities

$$0 < R < \frac{1}{12M - 1} \tag{15}$$

so that if $M \ge 9$ then the relative error is < 1%. If N is sufficiently larger than $L_i + n$, say if

$$N - L_i - n \ge 10 \tag{16}$$

then $(N - L_i - n - 1)!$ in formula (10) may be approximated by (13) with small error, and indeed all the factorials in (10) may be so approximated. It will usually be the case that (16) holds. Rewriting (10),

$$\log K = \log [(N - L_i)!] + \log [(N - n)!] - \log (N!) - \log [(N - L_i - n - 1)!] \tag{17}$$

Substituting (14) into (17) and simplifying,

$$\begin{aligned}
\log K \sim{} & (N - L_i + \tfrac{1}{2}) \log (N - L_i) + (N - n + \tfrac{1}{2}) \log (N - n) - (N + \tfrac{1}{2}) \log N \\
& - (N - L_i - n - \tfrac{1}{2}) \log (N - L_i - n - 1) - \log e
\end{aligned} \tag{18}$$

This can also be written as

$$\begin{aligned}
\log K \sim{} & (N + \tfrac{1}{2}) \cdot \left[ \log (N - L_i) + \log (N - n) - \log N - \log (N - L_i - n - 1) \right] \\
& - L_i \cdot \left( \log (N - L_i) - \log (N - L_i - n - 1) \right) \\
& - n \cdot \left( \log (N - n) - \log (N - L_i - n - 1) \right) \\
& + \log (N - L_i - n - 1) - \log e
\end{aligned} \tag{19}$$
The formula (19) is superior to formula (18) for actual computation. Of course we use (19) only when a direct computation using (9) is too difficult.
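
To give a feel for the accuracy of the approximation, the sketch below compares a Stirling-based value of log K with an exact value computed from the log-gamma function. It assumes K is the ratio of factorials $(N-L_i)!\,(N-n)! / (N!\,(N-L_i-n-1)!)$ and that the rearranged approximation groups terms as in the comments; the function names are illustrative.

```python
import math

def log10_K_exact(N, L, n):
    """log10 of K using lgamma, where lgamma(x + 1) = ln(x!)."""
    ln_k = (math.lgamma(N - L + 1) + math.lgamma(N - n + 1)
            - math.lgamma(N + 1) - math.lgamma(N - L - n))
    return ln_k / math.log(10)

def log10_K_stirling(N, L, n):
    """log10 of K from Stirling's formula, grouped around the factor (N + 1/2)."""
    lg = math.log10
    r = N - L - n - 1
    return ((N + 0.5) * (lg(N - L) + lg(N - n) - lg(N) - lg(r))
            - L * (lg(N - L) - lg(r))
            - n * (lg(N - n) - lg(r))
            + lg(r) - lg(math.e))
```

For N = 1000, L = 100, n = 3 the two values should agree closely, because the Stirling errors of the four factorials largely cancel.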
APPENDIX II


A MATHEMATICAL MODEL OF A TIME-SHARING SYSTEM

Scherr [14] has constructed a mathematical model of a time-sharing system having n consoles. In this model, each console is regarded as a source feeding the central processor. It is assumed that the central processor switches from console to console rapidly enough that one may consider that it is processing the requests from each console simultaneously; and that the central processor is devoting 1/m of its capability to each active console, if m is the number of active consoles. It is further assumed that if a particular console is not active, then the time interval between now and when it becomes active has an exponential distribution. The amount of processing required for a request from a console is assumed to have an exponential distribution. We can visualize Scherr's model as follows:
We would like to propose another model for a time-sharing system. In this model, we place the emphasis not on the consoles, but on the requests arriving at the central processor; this model may be visualized as follows:

Requests are assumed to originate according to the Poisson distribution, at a rate $\lambda$. It is assumed that the central processor deals with all requests simultaneously and that the amount of processing required by requests has an exponential distribution. Since the central processor's efficiency may vary depending on the number i of requests being processed, we will permit the processing rate to be dependent on i. We define $\mu_i$ to be the rate at which the processor serves requests if there are i requests being processed. A little more precisely, if there are i requests being processed and no more requests are received, then it is assumed that the time to first complete service has an exponential distribution with mean $\mu_i^{-1}$. Thus, $\mu_i$ is the rate of transition to the state where there are i-1 requests in process; each individual request is being processed at a rate $\mu_i / i$.

Now in many time-sharing systems, much of the swapping is actually performed by peripheral equipment; in such systems, the central processor's efficiency may not depend very significantly on the number of requests. Even so, it is convenient as a mathematical device to permit $\mu_i$ to be dependent on i, as will be discussed later.

This model may be analyzed by using the method of the imbedded Markov chain. We define a state to be any (maximal) interval of time during which the number of requests in process remains constant. One state ends and another begins if either a new request arrives or a request finishes processing. Let $S_n$ denote the number of requests in process during the n-th state. Define:

$$P_{\ell,n} = P(S_n = \ell), \quad \ell \ge 0.$$

Define:

$$\pi_\ell = \lim_{n \to \infty} P_{\ell,n}$$

that is, the $\pi_\ell$ are the limiting state probabilities. Define $P_\ell$ = the probability that if $\ell$ requests are in process, then the state terminates with the arrival of a new request. Define $Q_\ell$ = the probability that if $\ell$ requests are in process, then the state terminates by completing the processing of one of these requests. One has:

$$P_\ell = \frac{\lambda}{\lambda + \mu_\ell}, \qquad Q_\ell = \frac{\mu_\ell}{\lambda + \mu_\ell}$$

Of course $P_\ell + Q_\ell = 1$.

£  £ 

Under appropriate conditions, namely that the capability of the central processor exceeds the requirements of the incoming requests, the Markov chain is stable, and the $\pi_\ell$ are determined uniquely by the conditions:

$$\pi_\ell = Q_{\ell+1} \cdot \pi_{\ell+1} + P_{\ell-1} \cdot \pi_{\ell-1}, \quad \ell \ge 1$$

$$\pi_0 = Q_1 \cdot \pi_1$$

$$\sum_{n=0}^{\infty} \pi_n = 1.$$


Now it is one thing to say that the $\pi_\ell$ are uniquely determined, and it is quite another to actually compute the $\pi_\ell$. One may proceed as follows: define numbers $x_\ell$ recursively by:

$$x_0 = 1; \qquad x_1 = Q_1^{-1}; \qquad x_\ell = Q_\ell^{-1} \left( x_{\ell-1} - P_{\ell-2} \, x_{\ell-2} \right), \quad \ell \ge 2.$$

If we define:

$$K = \sum_{\ell=0}^{\infty} x_\ell$$

then the $x_\ell$ are related to the $\pi_\ell$ by:

$$\pi_\ell = x_\ell \cdot K^{-1}$$

Thus it is all a matter of computing K. If we compute $x_0, \ldots, x_N$, where N is large, then approximately:

$$K \approx \sum_{i=0}^{N} x_i$$
Therefore the key to the whole problem is to decide how large N must be in order to be able to use the above approximation. While we shall not enlarge further on this problem here, we will remark that for all cases of interest the problem is soluble, and therefore the $\pi_\ell$ are computable. Further, in many cases the $\pi_\ell$ can be computed exactly.

Let $t_i$ denote the average duration of a state during which there are i requests being processed; then:

$$t_i = \frac{1}{\lambda + \mu_i}$$

Let $\bar{\pi}_i$ denote the probability that at time t there are i requests being processed, as $t \to \infty$. Then it may be shown that:

$$\bar{\pi}_i = \frac{t_i \cdot \pi_i}{\sum_j t_j \cdot \pi_j}$$

Thus the $\bar{\pi}_i$ may be computed.

As an example, let us consider the case where $\mu_i$ is independent of i; thus there is a $\mu$ such that:

$$\mu_i = \mu, \quad i > 0.$$

It is not difficult to see that the $\bar{\pi}_i$ equal the state probabilities for a queue with single server, Poisson input, exponential service, so that in fact the problem is already solved. However, one can carry through the computations outlined above; the equations for the $\pi_i$ may be solved in closed form, giving:

$$\pi_0 = \frac{\mu - \lambda}{2\mu}, \qquad \pi_i = \frac{\mu + \lambda}{2} \cdot \frac{\lambda^{i-1}}{\mu^{i+1}} \cdot (\mu - \lambda), \quad i \ge 1.$$

The $\bar{\pi}_i$ are then:

$$\bar{\pi}_i = \left( \frac{\lambda}{\mu} \right)^i \left( 1 - \frac{\lambda}{\mu} \right)$$

in agreement with results for the queueing problem.
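
The computation outlined in this appendix can be sketched in a few lines. The code below is a minimal illustration, not the program used in the study: it assumes the recursion $x_0 = 1$, $x_1 = 1/Q_1$, $x_\ell = (x_{\ell-1} - P_{\ell-2} x_{\ell-2}) / Q_\ell$, truncates the infinite sum at a chosen `size`, and takes the service rate as a caller-supplied function `mu(i)` with `mu(0) = 0`. All names are hypothetical.

```python
def embedded_probabilities(lam, mu, size):
    """Limiting probabilities of the imbedded chain, truncated to `size` states."""
    def p_arrival(l):
        # Probability that a state with l requests ends with an arrival.
        return lam / (lam + mu(l))

    def q_completion(l):
        # Probability that a state with l requests ends with a completion.
        return mu(l) / (lam + mu(l))

    x = [1.0, 1.0 / q_completion(1)]
    for l in range(2, size):
        x.append((x[l - 1] - p_arrival(l - 2) * x[l - 2]) / q_completion(l))
    k = sum(x)  # truncated normalizing constant K
    return [value / k for value in x]

def time_probabilities(lam, mu, size):
    """Probabilities of finding i requests in process at a random instant."""
    pi = embedded_probabilities(lam, mu, size)
    weights = [p / (lam + mu(i)) for i, p in enumerate(pi)]  # t_i * pi_i
    total = sum(weights)
    return [w / total for w in weights]
```

With a constant rate, for instance `mu = lambda i: 2.0 if i > 0 else 0.0` and `lam = 1.0`, the time probabilities should reproduce the geometric distribution of the single-server queue.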

The state probabilities $\bar{\pi}_i$ give us a measure of information on the length of time a particular request may take. For example, suppose the efficiencies $\mu_i$ are independent of i, so that:

$$\mu_i = \mu, \quad i > 0.$$


Now  suppose  we  enter  a  request  which  requires  much  processing.  If  the 
request  stays  in  the  system  for  a  long  time,  then  we  may  consider  that, 
as  far  as  other  requests  are  concerned,  the  primary  effect  of  the  first  re¬ 
quest  is  to  reduce  the  processing  capability  of  the  system.  Thus  in  effect 
we  have  a  system  in  which: 


$$\mu_i = \frac{i}{i+1} \, \mu$$

If we compute the corresponding state probabilities $\bar{\pi}_\ell$, and if the amount of time required to process the request is T (assuming that the central processor works only on this request) then the expected time to process the request is approximately:

$$\frac{T}{\sum_{\ell=0}^{\infty} \bar{\pi}_\ell / (\ell + 1)}$$

From this we see that permitting $\mu_i$ to be dependent on i can be a useful mathematical device, even if there is no significant dependence in the actual system being considered.
APPENDIX III


PERSONNEL  ASSIGNMENT  ALGORITHMS 


The following discussion of the personnel-assignment problem is limited to three techniques. Each method assumes that a matrix of values indicating the utility of person i on job j is available. In addition, these values are assumed to be error free.¹


1. Decision Index

This technique assigns "personnel to jobs in a way that will tend to maximize their productivity". This technique does not guarantee an optimal policy in assigning personnel. The procedure consists of defining a Disposition or Decision Index (DI), which alters each element of the value matrix ($C_{pq}$) in a designated manner. The two DI suggested in footnote 2 for the case of n persons and n jobs and m persons and n jobs (m ≠ n) are, respectively:

$$DI_{pq} = \frac{1}{n(n-1)} \left[ n C_{pq} - C_{p.} - C_{.q} + C_{..} \right] \tag{1}$$

$$DI_{pq} = \frac{1}{n(m-1)} \left[ m C_{pq} - C_{p.} - C_{.q} + C_{..} \right] \tag{2}$$

¹ For a discussion of the personnel-assignment problem when the errors in these values are considered, see reference [16].

² For more detailed information, see reference [11].
where,

$$C_{..} = \sum_{p=1}^{n \text{ or } m} \sum_{q=1}^{n} C_{pq}$$

$$C_{p.} = \sum_{q=1}^{n} C_{pq}$$

$$C_{.q} = \sum_{p=1}^{n \text{ or } m} C_{pq}$$

$C_{pq}$ = productivity of the p-th person on the q-th job.


Once  the  DI  have  been  calculated  for  each  element  of  the  matrix, 
several  techniques  are  available  to  obtain  the  personnel  assignments. 

A.  The  first  procedure  is  to  compute  the  DI  for  each  element  of 
the  matrix  and  then  select  the  element  with  the  highest  DI  as  the  first 
assignment.  Delete  that  row  and  column  from  the  original  value  matrix 
and recalculate a new set of DI. Select the element with the highest DI
as  the  second  assignment.  Continue  this  procedure  until  all  personnel 
(or  jobs)  have  been  assigned. 

B.  The  second  procedure  is  to  calculate  the  original  DI  and  se¬ 
lect  the  highest  DI  for  assignment.  Select  the  second  highest  for  the 
second  assignment  and  continue  making  assignments  in  descending  DI 
order  without  recomputing  a  DI  matrix. 
C. The third procedure consists of a mixture of technique A and technique B; that is, recomputing the DI matrix after every K-th assignment.
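
Procedure A can be sketched as follows for the square case. This is only an illustration: it assumes the index for n persons and n jobs is $DI_{pq} = (n C_{pq} - C_{p.} - C_{.q} + C_{..}) / (n(n-1))$, breaks ties by taking the first maximum found, and uses hypothetical function names.

```python
def decision_indices(C):
    """Decision Index of every element of a square value matrix C (n >= 2)."""
    n = len(C)
    row_sums = [sum(row) for row in C]
    col_sums = [sum(C[p][q] for p in range(n)) for q in range(n)]
    total = sum(row_sums)
    return [[(n * C[p][q] - row_sums[p] - col_sums[q] + total) / (n * (n - 1))
             for q in range(n)] for p in range(n)]

def assign_procedure_a(C):
    """Pick the highest DI, delete its row and column, recompute, and repeat."""
    people = list(range(len(C)))
    jobs = list(range(len(C)))
    assignment = {}
    while len(people) > 1:
        sub = [[C[p][q] for q in jobs] for p in people]
        di = decision_indices(sub)
        cells = [(i, j) for i in range(len(people)) for j in range(len(jobs))]
        i, j = max(cells, key=lambda cell: di[cell[0]][cell[1]])
        assignment[people[i]] = jobs[j]
        del people[i]
        del jobs[j]
    assignment[people[0]] = jobs[0]  # the last person takes the last job
    return assignment
```

Procedure B would call `decision_indices` once and walk the cells in descending DI order, skipping rows and columns already used; procedure C recomputes after every K-th pick.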

The  advantage  of  the  Decision  Index  Technique  is  that  the  computa¬ 
tions  are  relatively  simple  and  in  addition,  the  logic  required  in  assign¬ 
ing  personnel  is  simple.  The  disadvantage  of  this  technique  is  that  the 
personnel  assignment  policy  is  not  always  the  optimal  policy. 


The time required to solve a 100 x 100 matrix, if technique A is used, can be estimated as follows.

The first assignment requires n² = 10,000 DI computations and n² = 10,000 comparisons to select the maximum DI.

Total number of DI computations = n² + (n-1)² + ... + 3² + 2² = total number of comparisons.

$$\text{Number of DI computations} = \frac{n(n+1)(2n+1)}{6} - 1 = \frac{100(101)(201)}{6} - 1 = 338,349.$$

Depending on the computer selected, an estimate of the time required to solve a 100 x 100 matrix can now be calculated. The required storage for the technique is 2n².
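
The operation count for technique A can be tallied directly for any matrix size. The helper below is a small sketch with an illustrative name; it simply adds the squares from n² down to 2², on the assumption that the final 1 x 1 assignment needs no DI computation.

```python
def di_computation_count(n):
    """Total DI computations for technique A on an n x n matrix: n^2 + ... + 2^2."""
    return sum(k * k for k in range(2, n + 1))
```

Multiplying the count by the time per DI computation on a given machine gives a rough running-time estimate.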

2. The Hungarian Method [12]

The theoretical justification for this method is presented in reference [12]. This method requires an n x n value matrix of positive integers. An initial cover and an initial set of independent marks must first be defined before entering the two main routines.


Let

$$a_i = \max_j r_{ij} \quad \text{for } i = 1, 2, \ldots, n \quad \text{(maximum for each row)}$$

$$b_j = \max_i r_{ij} \quad \text{for } j = 1, 2, \ldots, n \quad \text{(maximum for each column)}$$

where $r_{ij}$ = the utility of person i on job j, and

$$a = \sum_{i=1}^{n} a_i, \qquad b = \sum_{j=1}^{n} b_j$$

Then define values $u_i$ and $v_j$ (cover) as follows:

If $a \le b$ define

$$u_i = a_i \quad \text{for } i = 1, \ldots, n; \qquad v_j = 0 \quad \text{for } j = 1, \ldots, n$$

If $a > b$ define

$$u_i = 0 \quad \text{for } i = 1, 2, \ldots, n; \qquad v_j = b_j \quad \text{for } j = 1, 2, \ldots, n$$

From the cover ($u_i$, $v_j$) and the value matrix R = ($r_{ij}$), a qualification matrix Q = ($q_{ij}$) is defined,

$$q_{ij} = \begin{cases} 1 & \text{if } u_i + v_j = r_{ij} \\ 0 & \text{otherwise} \end{cases}$$

The  independent  marks  for  this  initial  set  are  defined  as  follows: 
1. If $a \le b$, the rows are examined in order and the first 1 in each row without an independent mark (designated as 1*) in its column is changed to a 1*.

2. If a > b, the rows and columns in statement 1 are interchanged.
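
The initialization step can be sketched as below. It covers only the initial cover, the qualification matrix, and the initial independent marks, not the two main routines; the function name and the return layout are illustrative assumptions, and the cover for the case a ≤ b is taken as $u_i = a_i$, $v_j = 0$.

```python
def initial_cover_and_marks(R):
    """Return (u, v, Q, marks) for a square utility matrix R of integers."""
    n = len(R)
    a = [max(row) for row in R]                              # row maxima
    b = [max(R[i][j] for i in range(n)) for j in range(n)]   # column maxima
    rows_first = sum(a) <= sum(b)
    if rows_first:
        u, v = a[:], [0] * n
    else:
        u, v = [0] * n, b[:]
    # Qualification matrix: 1 where the cover is tight.
    Q = [[1 if u[i] + v[j] == R[i][j] else 0 for j in range(n)] for i in range(n)]
    marks = set()   # positions (i, j) of the starred ones (1*)
    used = set()    # columns (or rows) that already hold a 1*
    for outer in range(n):
        for inner in range(n):
            i, j = (outer, inner) if rows_first else (inner, outer)
            if Q[i][j] and inner not in used:
                marks.add((i, j))
                used.add(inner)
                break
    return u, v, Q, marks
```

If the initial marks already number n, they form a complete assignment and the main routines are not needed.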

The relationship between the two routines of this technique is shown below.³

Since the "Hungarian Method" is an iterative process, an estimate of the time required to obtain a solution to an n x n matrix is difficult. This method can be programmed on a small computer readily, and the relation between n and the mean processing time can be obtained. It is important to observe two outcomes of the iterative method:

1) Every occurrence of Ia increases the number of assignments (1*) by one,

2) Every occurrence of IIa decreases the current covering sum ($\sum u_i + \sum v_j$),

which assures a solution.

³ For more detailed information, see reference [12].

3. Linear Programming [13]

The  personnel-assignment  problem  can  be  reformulated  into  a 
classical  linear  programming  problem,  the  transportation  problem.  This 
problem  has  been  studied  extensively  and  optimal  solutions  do  exist.  A 
description  of  the  technique  is  contained  in  reference  [  13]  ,  chapters  14 
and  15,  and  shall  not  be  repeated  here. 

Estimates  of  the  computing  time  required  to  solve  the  assignment 
problem  by  linear  programming  can  most  likely  be  obtained  with  a  more 
extensive  literature  search.  Linear  programs  are  now  available  at  many 
research  centers  and  time  estimates  should  be  available.  The  advan¬ 
tages  of  this  technique  are  that  optimal  solutions  are  obtained  and  the 
technique  is  universally  known. 